Contributor

@miguelgrinberg miguelgrinberg commented Sep 22, 2025

This change adds a few features that support the use of Pydantic models with the DSL module, instead of the standard models defined as subclasses of the AsyncDocument class.

As part of this work some additions have been made to the typing implementation of DSL documents.

  • Support for the Annotated syntax when defining document fields in the DSL module. Examples:
class TypedDocAnnotated(AsyncDocument):
    ip: Annotated[Optional[str], field.Ip()]
    k1: Annotated[str, field.Keyword(required=True)]
    k2: Annotated[M[str], field.Keyword()]
    k3: Annotated[str, mapped_field(field.Keyword(), default="foo")]
  • Option to exclude a class variable from the list of attributes used to create the ES mapping:
class Doc(AsyncDocument):
    some_var: str = mapped_field(exclude=True)
  • New BaseESModel and AsyncBaseESModel classes that inherit from Pydantic's BaseModel and add Elasticsearch superpowers. In particular, any model defined with one of these as its base class has meta and _doc private attributes, plus to_doc() and from_doc() methods. The meta attribute holds per-document metadata such as id or score. The _doc attribute is a dynamically generated Document or AsyncDocument instance that can be used whenever access to the Elasticsearch index is needed. The to_doc() and from_doc() methods convert between Pydantic models and ES documents.

    Aside from the extra attributes, this class works exactly like BaseModel and can be used to define data attributes and their validation rules, and the ES document is derived from them automatically. In particular, this class can be used in FastAPI routes, as shown in the quotes example included in this PR. Any annotations intended for the DSL module can be included in the Annotated[] type hint of the respective fields. The Index inner class can be included as well.

class Quote(BaseESModel):
    quote: str
    author: Annotated[str, dsl.Keyword()]
    tags: Annotated[list[str], dsl.Keyword()]
    embedding: Annotated[list[float], dsl.DenseVector()] = Field(init=False, default=[])

    class Index:
        name = 'quotes'
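
For readers unfamiliar with how metadata travels inside Annotated[], here is a minimal standard-library sketch of how such extras can be read back from a class. It is not the implementation in this PR: the Keyword and DenseVector dataclasses and the collect_field_metadata helper are hypothetical stand-ins used only for illustration.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_args, get_origin, get_type_hints


@dataclass
class Keyword:
    """Hypothetical stand-in for a DSL field type such as dsl.Keyword()."""
    name: str = "keyword"


@dataclass
class DenseVector:
    """Hypothetical stand-in for a DSL field type such as dsl.DenseVector()."""
    name: str = "dense_vector"


class Quote:
    quote: str
    author: Annotated[str, Keyword()]
    tags: Annotated[list[str], Keyword()]
    embedding: Annotated[list[float], DenseVector()]


def collect_field_metadata(cls: type) -> dict[str, Any]:
    """Map each annotated attribute to its first Annotated extra, or None."""
    result: dict[str, Any] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for attr, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, extra1, extra2, ...).
            _base, *extras = get_args(hint)
            result[attr] = extras[0] if extras else None
        else:
            result[attr] = None
    return result


metadata = collect_field_metadata(Quote)
for attr, extra in metadata.items():
    print(attr, "->", extra)
```

A plain `str` field carries no extra, so a library reading these hints would presumably fall back to a default mapping for it, while the annotated fields expose the field object that was attached.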

@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 4 times, most recently from 4ae0574 to 891205c Compare September 23, 2025 18:49
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch from 1f3f66c to b9ada0f Compare September 24, 2025 11:59
@miguelgrinberg miguelgrinberg changed the title Support Annotated typing hint Pydantic integration Sep 24, 2025
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch from f9ddddc to c552171 Compare September 24, 2025 18:21
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 2 times, most recently from 2d90d7a to 5f41106 Compare September 25, 2025 15:30
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 2 times, most recently from 112b8d4 to 51be343 Compare September 25, 2025 19:15
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch from 51be343 to 1a4822c Compare September 25, 2025 19:34

github-actions bot commented Sep 26, 2025

🔍 Preview links for changed docs

Member

@pquentin pquentin left a comment

Thanks! LGTM. I only have comments on the example app. I only skimmed the frontend code.

name = "quotes"
version = "0.1"
dependencies = [
    "elasticsearch[async]>=8,<9",

Member

Suggested change
-    "elasticsearch[async]>=8,<9",
+    "elasticsearch[async]>=9,<10",

Are you planning to backport this?

Contributor Author

Good catch! I created the project on which this example is based for a talk I gave last year, so the versions are out of date. I will refresh.

Comment on lines +56 to +63
doc = None
try:
    doc = await Quote._doc.get(id)
except NotFoundError:
    pass
if not doc:
    raise HTTPException(status_code=404, detail="Item not found")
return Quote.from_doc(doc)

Member

Isn't this code the same?

Suggested change
-doc = None
-try:
-    doc = await Quote._doc.get(id)
-except NotFoundError:
-    pass
-if not doc:
-    raise HTTPException(status_code=404, detail="Item not found")
-return Quote.from_doc(doc)
+try:
+    doc = await Quote._doc.get(id)
+    return Quote.from_doc(doc)
+except NotFoundError:
+    raise HTTPException(status_code=404, detail="Item not found")

This also applies to get_quote and delete_quote.
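
To make the comparison concrete, here is a small self-contained sketch that runs both shapes side by side. Everything in it is a stand-in: NotFoundError and HTTPException are local classes rather than the real client and FastAPI exceptions, and fake_get with its in-memory FAKE_INDEX is a hypothetical replacement for Quote._doc.get.

```python
import asyncio


class NotFoundError(Exception):
    """Stand-in for the client's not-found exception."""


class HTTPException(Exception):
    """Stand-in for FastAPI's HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Hypothetical in-memory store replacing the Elasticsearch index.
FAKE_INDEX = {"1": {"quote": "hello"}}


async def fake_get(id: str) -> dict:
    if id not in FAKE_INDEX:
        raise NotFoundError(id)
    return FAKE_INDEX[id]


async def get_original(id: str) -> dict:
    # Shape of the code under review: swallow the error, then check for None.
    doc = None
    try:
        doc = await fake_get(id)
    except NotFoundError:
        pass
    if not doc:
        raise HTTPException(status_code=404, detail="Item not found")
    return doc


async def get_suggested(id: str) -> dict:
    # Shape of the suggestion: translate the error directly into a 404.
    try:
        return await fake_get(id)
    except NotFoundError:
        raise HTTPException(status_code=404, detail="Item not found")


async def outcome(handler, id: str):
    """Return the document, or the status code if the handler raised."""
    try:
        return await handler(id)
    except HTTPException as exc:
        return exc.status_code


async def main() -> list:
    return [
        (await outcome(get_original, id), await outcome(get_suggested, id))
        for id in ("1", "missing")
    ]


results = asyncio.run(main())
print(results)
```

Under these assumptions both versions behave the same for a found and a missing id. They would differ only if the lookup returned a falsy value instead of raising, which the original version also turns into a 404.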

async def create_quote(req: Quote) -> Quote:
    embed_quotes([req])
    doc = req.to_doc()
    doc.meta.id = ""

Member

Why is this needed?

if req.query == '':
    s = s.query(dsl.query.MatchAll())
elif req.knn:
    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=model.encode(req.query).tolist()))

Member

nit: Splitting this into two lines could help with readability:

Suggested change
-    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=model.encode(req.query).tolist()))
+    query_vector = model.encode(req.query).tolist()
+    s = s.query(dsl.query.Knn(field=Quote._doc.embedding, query_vector=query_vector))

Quotes database example, which demonstrates the Elasticsearch integration with
Pydantic models. This example features a React frontend and a FastAPI back end.

![Quotes app screenshot](screenshot.png)

Member

Great quotes! Do you maybe have an example that does rely on embeddings?

Contributor Author

I don't understand what you mean here. This example uses embeddings, both on their own and combined with BM25 search.

Member

I mean, the results would have been the same with BM25 only.

Member

So maybe there is something other than "dogs and books" that we can use. That's just a nitpick.

Contributor Author

Ah, I see, you are talking about the screenshot. In fact, "dogs and books" does not match the Groucho Marx quote at the top when using BM25, because that quote has "dog" and "book" in the singular. I have to test this, but maybe using "canine" instead of "dog" makes the example clearer and returns the same results.
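
The singular/plural point can be illustrated with a rough sketch of exact-term matching. This is not Elasticsearch: the tokenizer below only approximates an analyzer that lowercases and splits words without stemming, naive_stem is a toy helper, and the quote text is an illustrative sentence rather than the one in the screenshot.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-word characters; no stemming is applied."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def naive_stem(token: str) -> str:
    """Toy plural stripping, only to illustrate what a stemmer would change."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


# Illustrative text, not the quote from the screenshot.
quote = "A dog is a friend; a book is a friend too."
query = "dogs and books"

# Term-based scoring needs at least one shared term to produce a match.
exact_overlap = tokenize(query) & tokenize(quote)
stemmed_overlap = {naive_stem(t) for t in tokenize(query)} & {
    naive_stem(t) for t in tokenize(quote)
}

print(sorted(exact_overlap))
print(sorted(stemmed_overlap))
```

Without stemming the query and the text share no terms, so a purely term-based search has nothing to score, while an embedding-based search can still relate the two. With the toy stemmer the terms "dog" and "book" line up.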

Member

What is the purpose of this folder? This looks like a create-react-app artifact that is not actually needed here.

Contributor Author

@miguelgrinberg miguelgrinberg Oct 15, 2025

Correct. This project was initially created with create-react-app, and I have now refreshed it and moved it to Vite. The public folder doesn't serve any purpose in this example, but I'm guessing Vite expects it to be there, since it serves static files from it.
